Grouping and Categorization of Documents in Relativity Measure
Authors
Abstract
This paper presents a spectral clustering method called correlation preserving indexing (CPI), which operates in the correlation similarity measure space. Documents are projected into a low-dimensional semantic space in which the correlations between documents within local patches are maximized and the correlations between documents outside these patches are minimized. The intrinsic structure of the document space is embedded in the similarities between documents, and correlation is a more suitable similarity measure than Euclidean distance for uncovering this structure. As a result, the proposed CPI method can effectively discover the intrinsic structures embedded in a high-dimensional document space. The effectiveness of the new method is demonstrated through extensive experiments on various data sets and by comparison with existing document clustering methods.
Key Terms: Document Clustering, Correlation Latent Semantic Indexing, Dimensionality Reduction, Correlation Measure.
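The abstract's two core ideas, that correlation captures document similarity better than Euclidean distance and that correlations should be high inside local patches and low outside them, can be illustrated with a small sketch. It assumes "correlation" means the uncentered correlation (cosine of the angle between term vectors) and defines a patch as a document's k most-correlated neighbours; the helpers `local_patches` and `patch_contrast` and the toy vectors are hypothetical illustrations, not the authors' CPI algorithm, which additionally learns the low-dimensional projection.

```python
import math
from typing import List, Sequence, Set


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two term-weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Uncentered correlation (cosine of the angle) between two vectors.

    Assumption: correlation is the normalised inner product;
    returns 0.0 if either vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norms if norms > 0 else 0.0


def local_patches(docs: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """Hypothetical helper: each document's patch is its k most-correlated neighbours."""
    patches = []
    for i, doc in enumerate(docs):
        others = [j for j in range(len(docs)) if j != i]
        others.sort(key=lambda j: correlation(doc, docs[j]), reverse=True)
        patches.append(set(others[:k]))
    return patches


def patch_contrast(docs: Sequence[Sequence[float]], k: int) -> float:
    """Mean correlation inside the local patches minus mean correlation outside them.

    A larger value means neighbours are strongly correlated while
    non-neighbours are weakly correlated. Requires len(docs) > k + 1.
    """
    patches = local_patches(docs, k)
    inside, outside = [], []
    for i, patch in enumerate(patches):
        for j in range(len(docs)):
            if j != i:
                target = inside if j in patch else outside
                target.append(correlation(docs[i], docs[j]))
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# Hypothetical term-frequency vectors over a 4-word vocabulary.
short_doc = [1.0, 2.0, 0.0, 1.0]
long_doc = [3.0, 6.0, 0.0, 3.0]   # same topic as short_doc, three times as long
other_doc = [1.0, 0.0, 2.0, 0.0]  # different topic, similar length to short_doc

print(euclidean(short_doc, long_doc), correlation(short_doc, long_doc))
print(euclidean(short_doc, other_doc), correlation(short_doc, other_doc))
print(patch_contrast([short_doc, long_doc, other_doc], k=1))
```

Euclidean distance ranks `other_doc` as closer to `short_doc` (3.0 versus about 4.90) merely because it has a similar length, whereas correlation scores `long_doc` as a perfect match (1.0 versus about 0.18) because it uses the same words in the same proportions. This scale invariance is one reason correlation can reveal topical structure that Euclidean distance obscures.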
Similar resources
General and Professional Qualifications of Iranian High School Masters according to the Ministerial Documents
H. Abdollaahi, Ph.D. A perusal of the ministerial documents at the Iranian Ministry of Education during the past 100 years was undertaken in order to clarify how the general and professional qualifications required for the post of high school master have evolved over the c...
Concept Based Categorization of Documents for Search Engines
Nowadays, information retrieval is a challenging task for search engines. In this paper we discuss text categorization. Text document categorization is the process of classifying documents according to some predefined knowledge. Documents with the same concept are grouped together, and documents with different concepts form other groups according to their similarity of context of the d...
Arabic Text Categorization Algorithm using Vector Evaluation Method
Text categorization is the process of grouping documents into categories based on their contents. This process is important for making information retrieval easier, and it has become more important due to the huge amount of textual information available online. The main problem in text categorization is how to improve the classification accuracy. Although Arabic text categorization is a promising new field, the...
Study of buffer effects on the grouping efficacy measure of stochastic cell formation problem
This paper deals with the stochastic cell formation problem (SCFP). The paper presents a new nonlinear integer programming model for the SCFP in which the effect of buffer size on the grouping efficacy of cells is investigated. The objective function is the maximization of the grouping efficacy of cells. A chance constraint is applied to explore the effect of the buffer on the SCFP. Processing tim...
New Methods for Text Categorization Based on a New Feature Selection Method and a New Similarity Measure Between Documents
In this paper, we present a new feature selection method based on document frequencies and statistical values. We also present a new similarity measure to calculate the degree of similarity between documents. Based on the proposed feature selection method and the proposed similarity measure between documents, we present three methods for dealing with the Reuters-21578 top 10 categories text cat...